Automated Analysis of Fault-Tolerance in Distributed Systems
Abstract
Abstract values alone capture too little information about relationships between concrete values. For example, consider a system containing a majority voter. The voter's outputs depend on equality relationships among its inputs. If two inputs both have abstract value $\mathbb{N}$, denoting the natural numbers, there is no way to tell from this whether they are equal. To track relationships between values more accurately, we introduce a set $\mathit{SVal}$ of symbolic values, which are expressions composed of constants and variables. A constant represents the same value in every concrete run of the system; for example, a constant $\mathit{maj}$ might represent a majority function. A variable represents values that may be different in different concrete runs of the system. Variables are useful for modeling outputs that are not completely determined by a component's inputs. Such outputs commonly arise with components that interact with an environment that is not modeled explicitly; they also arise when a component's behavior is approximated. Each variable is associated with ("local to") a single component, whose behavior in a given concrete run determines the value of that variable. This allows independent proofs that each I/O function represents the behavior of the corresponding process. Roughly, the reason is that runs do not have canonical forms [Sto97]. For convenience, we include in $\mathit{SVal}$ a special wildcard symbol "$\_$", which can always represent any value. Finally, we allow a value to contain a set of possibilities; thus, we define
$$\mathit{Val} = \mathit{Set}(\mathit{SVal} \times \mathit{AVal}) \setminus \{\emptyset\}, \tag{9}$$
where $\mathit{Set}(S)$ is the powerset of a set $S$. Note that I/O functions incorporate a form of symbolic computation, since their inputs and outputs contain symbolic values.
Notation. Since abstract values are analogous to types, we sometimes write $\langle s, a \rangle \in \mathit{SVal} \times \mathit{AVal}$ as $s : a$. We often omit braces around singleton sets; for example, $\{\langle X, \mathbb{N} \rangle\} \in \mathit{Val}$ may be written $X : \mathbb{N}$. We sometimes elide the wildcard; thus, $\{\langle \_, \mathbb{N} \rangle\} \in \mathit{Val}$ may be written $\mathbb{N}$.
Example.
Consider a two-stage replicated pipeline. The system contains a source $S$, which sends a value to three components $F_1, F_2, F_3$, each of which applies a function represented by the constant $F$ to its input and sends the result to the next stage in the pipeline. The components $G_1, G_2, G_3$ in the next stage each apply a function represented by the constant $G$ to their input and send the result to a voter. The 3-way voter $V$ waits for an input from each $G_i$, applies a 3-way majority function, represented by the constant $\mathit{maj}$, to those inputs, and sends the result to an actuator $A$. More precisely, $\mathit{maj}$ represents any function of 3 arguments that, when any two of its arguments are equal, returns that repeated value. A run representing the behavior of this system appears in Figure 1. Here, $X$ is a local variable of the source. The run is obtained as a fixed point from I/O functions for the components. Due to space limitations, we only discuss the I/O function for the voter. Briefly, if the voter receives 3 inputs containing symbolic values $s_1, s_2, s_3$, then in the general case, the voter's output contains the symbolic value $\mathit{maj}(s_1, s_2, s_3)$. If, in addition, two inputs contain the same symbolic value $s$ (other than the wildcard), then a step of symbolic simplification can be done, yielding the symbolic value $s$. For example, $\mathit{maj}(X, X, Y)$ simplifies to $X$.
Multiplicities. Multiplicities (i.e., numbers of messages) also need to be approximated. Uncertainty in the number of messages sent during a computation may stem from various sources, including non-determinism of components (especially faulty
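The symbolic part of the voter's I/O function in the example above can be illustrated with a small Python sketch. It is only a rough model under stated assumptions: symbolic variables and constants are encoded as plain strings, the wildcard is spelled `"_"`, and the names `App`, `WILDCARD`, `voter_output`, and `make_val` are hypothetical helpers introduced here, not part of the paper. Multiplicities and the fixed-point computation of the run are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple, Union

WILDCARD = "_"  # assumed spelling of the wildcard symbolic value


@dataclass(frozen=True)
class App:
    """A constant function symbol applied to symbolic arguments (hypothetical encoding)."""
    fn: str
    args: Tuple["Sym", ...]


# Variables and constants are plain strings in this sketch.
Sym = Union[str, App]


def voter_output(s1: Sym, s2: Sym, s3: Sym) -> Sym:
    """Symbolic value in the 3-way voter's output for inputs s1, s2, s3."""
    inputs = (s1, s2, s3)
    for s in inputs:
        # Two identical non-wildcard symbolic inputs: maj returns that value.
        if s != WILDCARD and inputs.count(s) >= 2:
            return s
    # General case: keep the application of maj unsimplified.
    return App("maj", inputs)


def make_val(*pairs: Tuple[Sym, str]) -> frozenset:
    """Build an element of Val: a non-empty set of (symbolic, abstract) pairs."""
    if not pairs:
        raise ValueError("Val excludes the empty set")
    return frozenset(pairs)


# maj(X, X, Y) simplifies to X; maj(X, Y, Z) stays unsimplified.
simplified = voter_output("X", "X", "Y")
unsimplified = voter_output("X", "Y", "Z")

# With no faults, all three replicas deliver G(F(X)), so the voter outputs G(F(X)).
g_of_f_of_x = App("G", (App("F", ("X",)),))
pipeline_result = voter_output(g_of_f_of_x, g_of_f_of_x, g_of_f_of_x)

# The value written X : N, with the abstract value N encoded as the string "N".
x_nat = make_val(("X", "N"))
```

Two wildcard inputs are deliberately not simplified: a wildcard can stand for any value, so two wildcards carry no evidence that the underlying concrete values are equal.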
Journal title: Formal Methods in System Design
Volume: 26
Publication date: 2005